
    Ensemble Committees for Stock Return Classification and Prediction

    This paper considers a portfolio trading strategy formulated by algorithms in the field of machine learning. The profitability of the strategy is measured by the algorithm's capability to consistently and accurately identify stock indices with positive or negative returns, and to generate a preferred portfolio allocation on the basis of a learned model. Stocks are characterized by time series data sets consisting of technical variables that reflect market conditions in a previous time interval, which are utilized to produce binary classification decisions in subsequent intervals. The learned model is constructed as a committee of random forest classifiers, a non-linear support vector machine classifier, a relevance vector machine classifier, and a constituent ensemble of k-nearest neighbors classifiers. The Global Industry Classification Standard (GICS) is used to explore the ensemble model's efficacy within various fields of investment, including Energy, Materials, Financials, and Information Technology. Data from 2006 to 2012, inclusive, are considered; this period is chosen because it provides a range of market circumstances for evaluating the model. The model is observed to achieve an accuracy of approximately 70% when predicting stock price returns three months in advance. Comment: 15 pages, 4 figures; Neukom Institute Computational Undergraduate Research prize, second place.
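    A minimal sketch of such a soft-voting committee, assuming scikit-learn-style components (scikit-learn ships no relevance vector machine, so a second probabilistic SVM stands in for the RVM member here; the features and labels are random placeholders rather than the paper's technical variables):

        # Hedged sketch of a soft-voting committee for binary return classification.
        # Placeholders: random features stand in for lagged technical variables,
        # random signs for next-interval return labels; a probabilistic SVM acts
        # as a stand-in for the relevance vector machine.
        import numpy as np
        from sklearn.ensemble import RandomForestClassifier, VotingClassifier, BaggingClassifier
        from sklearn.svm import SVC
        from sklearn.neighbors import KNeighborsClassifier
        from sklearn.model_selection import train_test_split

        rng = np.random.default_rng(0)
        X = rng.normal(size=(500, 10))               # placeholder technical variables
        y = (rng.normal(size=500) > 0).astype(int)   # placeholder return signs

        committee = VotingClassifier(
            estimators=[
                ("rf", RandomForestClassifier(n_estimators=200)),
                ("svm", SVC(kernel="rbf", probability=True)),
                ("rvm_proxy", SVC(kernel="rbf", probability=True, C=0.5)),  # RVM stand-in
                ("knn", BaggingClassifier(KNeighborsClassifier(n_neighbors=5), n_estimators=25)),
            ],
            voting="soft",  # average the members' predicted class probabilities
        )

        X_tr, X_te, y_tr, y_te = train_test_split(X, y, shuffle=False)  # preserve time order
        committee.fit(X_tr, y_tr)
        print("held-out accuracy:", committee.score(X_te, y_te))

    Soft voting averages the members' predicted class probabilities, which is one natural way to realize a committee decision; the paper's exact aggregation rule may differ.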

    Information-Theoretic Limits for Density Estimation

    This paper is concerned with the information-theoretic limits of density estimation for Gaussian random variables with independent and identically distributed data. We apply Fano's inequality to the space of densities and an arbitrary estimator. We derive necessary conditions on the sample size for reliable density recovery and for reliable density estimation. These conditions hold simultaneously for both finite- and infinite-dimensional density spaces.
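    The flavor of such a necessary condition can be illustrated by the classical generalized Fano bound, stated here in its generic form rather than as the paper's exact result: for any 2\delta-separated densities f_1, ..., f_M in the class and n independent observations,

        \[
        \inf_{\hat{f}} \max_{1 \le j \le M} \mathbb{P}_{f_j}\!\left( d(\hat{f}, f_j) \ge \delta \right)
        \;\ge\; 1 - \frac{n \max_{j \ne k} D_{\mathrm{KL}}(f_j \,\|\, f_k) + \log 2}{\log M},
        \]

    so reliable recovery is impossible unless the sample size satisfies, roughly, n \gtrsim \log M / \max_{j \ne k} D_{\mathrm{KL}}(f_j \,\|\, f_k).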

    Essays on Numerical Integration in Hamiltonian Monte Carlo

    This thesis considers a variety of topics broadly unified under the theme of geometric integration for Riemannian manifold Hamiltonian Monte Carlo. In chapter 2, we review fundamental topics in numerical computing (section 2.1), classical mechanics (section 2.2), integration on manifolds (section 2.3), stochastic differential equations (section 2.4), Riemannian geometry (section 2.5), information geometry (section 2.6), and Markov chain Monte Carlo (section 2.7). The purpose of these sections is to place the topics discussed in the thesis within a broader context. The subsequent chapters are largely self-contained but refer back to this foundational material where appropriate. Chapter 3 gives a formal means of conceptualizing the Markov chains corresponding to Riemannian manifold Hamiltonian Monte Carlo and related methods; this formalism is useful for understanding the significance of reversibility and volume preservation in maintaining detailed balance in Markov chain Monte Carlo. Throughout the remainder of the thesis, we investigate alternative methods of geometric numerical integration for use in Riemannian manifold Hamiltonian Monte Carlo, discuss numerical issues involving violations of reversibility and detailed balance, and propose new algorithms with superior theoretical foundations.
    In chapter 4, we evaluate the implicit midpoint integrator for Riemannian manifold Hamiltonian Monte Carlo, the first time this integrator has been deployed and assessed within this context. We discuss attributes of the implicit midpoint integrator that make it preferable, and inferior, to alternative methods of geometric integration such as the generalized leapfrog procedure. In chapter 5, we treat the empirical question of the extent to which convergence thresholds matter for geometric numerical integration in Riemannian manifold Hamiltonian Monte Carlo. If the convergence threshold is too large, the Markov chain transition kernel will fail to maintain detailed balance, whereas a very small convergence threshold incurs computational penalties. We investigate these phenomena and suggest two mechanisms, based on stochastic approximation and higher-order solvers for non-linear equations, which can aid in identifying convergence thresholds or reduce their significance.
    In chapter 6, we consider a numerical integrator for Markov chain Monte Carlo based on the Lagrangian, rather than the Hamiltonian, formalism of classical mechanics. Our contributions include clarifying the order of accuracy of this numerical integrator, which has been misunderstood in the literature, and evaluating a simple change that can accelerate the implementation of the method but comes at the cost of producing more serially auto-correlated samples. We also discuss robustness properties of the Lagrangian numerical method that do not materialize in the Hamiltonian setting. Chapter 7 examines theories of geometric ergodicity for Riemannian manifold Hamiltonian Monte Carlo and Lagrangian Monte Carlo, and proposes a simple modification to these Markov chain methods that enables geometric ergodicity to be inherited from the manifold Metropolis-adjusted Langevin algorithm. In chapter 8, we show how to revise an explicit integrator using a theory of Lagrange multipliers so that the resulting numerical method satisfies reversibility and volume preservation.
    Supplementary content appears in chapter E, which investigates the theory of shadow Hamiltonians of the implicit midpoint method in the case of non-canonical Hamiltonian mechanics, and chapter F, which treats the continual adaptation of a parameterized proposal distribution in the independent Metropolis-Hastings sampler.
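    To make the role of the convergence threshold concrete, here is a generic fixed-point implementation of one implicit midpoint step for an ODE z' = F(z); this is an illustrative sketch, not the thesis's code, and the test problem, step size, and tolerance are arbitrary choices:

        import numpy as np

        def implicit_midpoint_step(z, F, h, tol=1e-10, max_iters=100):
            """One implicit midpoint step z1 = z + h * F((z + z1) / 2), solved by
            fixed-point iteration; `tol` is the convergence threshold whose size
            trades detailed balance against computational cost."""
            z1 = z + h * F(z)  # explicit Euler initial guess
            for _ in range(max_iters):
                z_new = z + h * F(0.5 * (z + z1))
                if np.max(np.abs(z_new - z1)) < tol:
                    return z_new
                z1 = z_new
            raise RuntimeError("fixed-point iteration did not converge")

        # Test problem: harmonic oscillator H(q, p) = (q^2 + p^2) / 2, for which
        # the implicit midpoint method conserves the quadratic energy exactly
        # (up to the solver tolerance).
        F = lambda z: np.array([z[1], -z[0]])  # (dq/dt, dp/dt) = (p, -q)
        z = np.array([1.0, 0.0])
        for _ in range(1000):
            z = implicit_midpoint_step(z, F, h=0.1)
        print("energy drift:", 0.5 * (z @ z) - 0.5)

    A looser tol returns an inexact solution of the implicit equation, which is precisely the source of the reversibility violations discussed above; a tighter tol costs more fixed-point iterations per step.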

    On Opportunity Cost Bounds for the Knowledge Gradient

    We prove an upper bound on the cumulative opportunity cost of the online knowledge gradient algorithm. We leverage the theory of martingales to obtain a bound under the Gaussian assumption. Using results from information theory, we further provide asymptotic bounds on the cumulative opportunity cost that hold with high probability.
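    For concreteness, a sketch of the knowledge-gradient computation for independent Gaussian beliefs in its standard ranking-and-selection form (following the usual Frazier/Powell-style derivation; the paper's exact setting and decision rule may differ, and all numbers below are placeholders):

        import numpy as np
        from scipy.stats import norm

        def kg_factors(mu, sigma2, noise_var):
            """Knowledge-gradient factors for independent Gaussian beliefs.
            mu, sigma2: posterior means and variances of each alternative;
            noise_var: measurement noise variance."""
            sigma_tilde = sigma2 / np.sqrt(sigma2 + noise_var)  # predictive change in the mean
            best_other = np.array([np.max(np.delete(mu, i)) for i in range(len(mu))])
            zeta = -np.abs(mu - best_other) / sigma_tilde
            return sigma_tilde * (zeta * norm.cdf(zeta) + norm.pdf(zeta))

        mu = np.array([0.2, 0.5, 0.4])       # placeholder posterior means
        sigma2 = np.array([1.0, 0.3, 0.8])   # placeholder posterior variances
        kg = kg_factors(mu, sigma2, noise_var=1.0)
        choice = np.argmax(mu + kg)  # one common online-KG rule (unit horizon weight)
        print(kg, choice)

    The opportunity cost of a measurement is the gap between the best alternative's mean and the mean of the one actually measured; the paper's bounds control how these gaps accumulate over time.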

    Communication Complexity of Distributed Statistical Algorithms

    This paper constructs bounds on the minimax risk under various loss functions when statistical estimation is performed in a distributed environment under communication constraints. We treat this problem using techniques from information theory and communication complexity. In many cases our bounds rely crucially on metric entropy conditions and the classical reduction from estimation to testing. A number of examples exhibit how these bounds on the minimax risk play out in practice. We also study distributed statistical estimation problems in the context of PAC-learnability, derive explicit algorithms for solving classical problems, and study the communication complexity of these algorithms.
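    The classical reduction from estimation to testing has the following generic form (stated here as background, not as the paper's exact bound): for any 2\delta-separated parameters \theta_1, ..., \theta_M,

        \[
        \inf_{\hat{\theta}} \sup_{\theta} \mathbb{E}_{\theta}\, d(\hat{\theta}, \theta)
        \;\ge\; \delta \, \inf_{\psi} \max_{1 \le j \le M} \mathbb{P}_{\theta_j}(\psi \ne j),
        \]

    where \psi ranges over tests selecting one of the M hypotheses. Metric entropy controls how large a separated set can be chosen, and under communication constraints the information available to the test is further limited by the number of transmitted bits.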

    Optimistic and Parallel Ising Model Estimation

    We consider a new method for estimating the structure of Ising graphical models from data. We assume that the data are observed with error, so that they are, in a sense, unreliable. We propose and investigate an "optimistic" estimator; that is, an approach that seeks to correct the log-likelihood objective function when some amount of the data is known to be mismeasured. We derive an interior point algorithm that constructs our estimator efficiently, and demonstrate that it leads naturally to a parallel procedure for recovering the graphical structure of Ising models. We show that the optimistic estimator has performance comparable to, and in some cases exceeding, that of regularized logistic regression in the presence of noise.
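    For context, a sketch of the parallel, per-node baseline that such estimators are compared against: Ravikumar-style neighborhood selection via L1-regularized logistic regression, one independent regression per node (the paper's optimistic correction and interior point solver are not reproduced here, and the data below are random placeholders):

        # Per-node baseline: L1-regularized logistic regression of each spin on
        # all others; nonzero coefficients flag candidate edges. The paper's
        # "optimistic" correction for mismeasured data is not reproduced here.
        import numpy as np
        from joblib import Parallel, delayed
        from sklearn.linear_model import LogisticRegression

        def neighborhood(X, j, C=0.1):
            """Regress spin j on the remaining spins; return its edge indicators."""
            y = X[:, j]
            others = np.delete(X, j, axis=1)
            clf = LogisticRegression(penalty="l1", solver="liblinear", C=C).fit(others, y)
            w = np.insert(clf.coef_.ravel(), j, 0.0)  # re-insert the self-coefficient
            return np.abs(w) > 1e-6

        X = np.random.default_rng(0).integers(0, 2, size=(1000, 8))  # placeholder spins
        rows = Parallel(n_jobs=-1)(delayed(neighborhood)(X, j) for j in range(X.shape[1]))
        adjacency = np.array(rows)
        edges = adjacency & adjacency.T  # AND rule to symmetrize the estimates
        print(edges.astype(int))

    Because each node's regression is independent of the others, the per-node loop parallelizes trivially, which mirrors the parallel structure the paper exploits.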